Dimensionality Reduction by Random Mapping: Fast Similarity Computation for Clustering

Authors

  • Miki Rubinstein
  • Samuel Kaski
  • Tal Hassner
Abstract

When the data vectors are high-dimensional, it is computationally infeasible to use data analysis or pattern recognition algorithms which repeatedly compute similarities or distances in the original data space. It is therefore necessary to reduce the dimensionality before, for example, clustering the data. If the dimensionality is very high, as in the WEBSOM method, which organizes textual document collections on a Self-Organizing Map, then even commonly used dimensionality reduction methods such as principal component analysis may be too costly. It will be demonstrated that the document classification accuracy obtained after the dimensionality has been reduced using a random mapping method will be almost as good as the original accuracy if the final dimensionality is sufficiently large (about … out of …). In fact, it can be shown that the inner product (similarity) between the mapped vectors closely follows the inner product of the original vectors.
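The inner-product claim in the abstract can be checked numerically. The sketch below is only an illustration under stated assumptions: the abstract does not say how the random matrix is built, so a Gaussian matrix with unit-length columns is assumed here, and the function names `random_mapping` and `inner_product_error` are hypothetical helpers, not part of the paper or of WEBSOM.

```python
import numpy as np


def random_mapping(X, d, seed=0):
    """Map the rows of X (n_samples x N) to d dimensions with a random matrix.

    Assumption: R is drawn from a standard normal distribution and each of
    its N columns is scaled to unit length, so R.T @ R is close to identity.
    """
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((d, X.shape[1]))
    R /= np.linalg.norm(R, axis=0, keepdims=True)  # unit-length columns
    return X @ R.T


def inner_product_error(X, Y):
    """Mean absolute difference between the inner-product matrices of X and Y."""
    return float(np.mean(np.abs(X @ X.T - Y @ Y.T)))


if __name__ == "__main__":
    # Synthetic stand-in for document vectors: non-negative, unit-length rows.
    data_rng = np.random.default_rng(1)
    X = data_rng.random((40, 2000))
    X /= np.linalg.norm(X, axis=1, keepdims=True)

    for d in (50, 200, 800):
        Y = random_mapping(X, d)
        print(d, inner_product_error(X, Y))
```

Under these assumptions the error should shrink as the target dimensionality `d` grows, which is the qualitative behaviour the abstract describes. The synthetic data only stands in for real document vectors.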

Similar resources

Dimensionality Reduction by Random Mapping: Fast

When the data vectors are high-dimensional it is computationally infeasible to use data analysis or pattern recognition algorithms which repeatedly compute similarities or distances in the original data space. It is therefore necessary to reduce the dimensionality before, for example, clustering the data. If the dimensionality is very high, like in the WEBSOM method which organizes textual doc...

Recursive nearest agglomeration (ReNA): fast clustering for approximation of structured signals

In this work, we revisit fast dimension reduction approaches, as with random projections and random sampling. Our goal is to summarize the data to decrease computational costs and memory footprint of subsequent analysis. Such dimension reduction can be very efficient when the signals of interest have a strong structure, such as with images. We focus on this setting and investigate feature clust...

Document Clustering: Before and After the Singular Value Decomposition

Document Clustering is an issue of measuring similarity between documents and grouping similar documents together. Information Retrieval (IR) is an issue of comparing query with a collection of documents to locate a set of documents relevant to a particular query. In the vector space IR model, a query is treated as a document which consists of a few terms. Therefore, in both clustering and retr...

Fast Transformation-Invariant Factor Analysis

Dimensionality reduction techniques such as principal component analysis and factor analysis are used to discover a linear mapping between high dimensional data samples and points in a lower dimensional subspace. In [6], Jojic and Frey introduced mixture of transformation-invariant component analyzers (MTCA) that can account for global transformations such as translations and rotations, perform...

A fast and novel technique for color quantization using reduction of color space dimensionality

This paper describes a fast and novel technique for color quantization using reduction of color space dimensionality. The color histogram is repeatedly subdivided into smaller and smaller classes. The colors of each class are projected on a carefully selected line, such that the color dissimilarities are preserved. Instead of using the principal axis of each class, the line is defined by the me...

Publication date: 2006